Expected value

In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or mean, or first moment) of a random variable is the integral of the random variable with respect to its probability measure.[1][2]

For discrete random variables this is equivalent to the probability-weighted sum of the possible values.

For continuous random variables with a density function it is the probability density-weighted integral of the possible values.

The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment.

The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is almost surely the limit of the sample mean as sample size grows to infinity. The value may not be expected in the general sense — the "expected value" itself may be unlikely or even impossible (such as having 2.5 children), just like the sample mean.

The expected value does not exist for some distributions with large "tails", such as the Cauchy distribution.[3]

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.
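As an illustration of this idea, the following minimal Python sketch estimates the probability that a fair die shows at least 5 by averaging an indicator variable (the event and the sample size are arbitrary choices for illustration):

    import random

    # Estimate P(die roll >= 5) as the expected value of an indicator
    # variable that is 1 when the event occurs and 0 otherwise.
    random.seed(0)
    n = 100_000
    indicator_mean = sum(1 if random.randint(1, 6) >= 5 else 0 for _ in range(n)) / n
    print(indicator_mean)  # close to the true probability 2/6 = 0.333...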

History

The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, posed by the French nobleman the Chevalier de Méré. The problem concerned two players who want to end a game early and, given the current circumstances of the game, divide the stakes fairly, based on the chance each has of winning the game from that point. The problem was solved in 1654 by Blaise Pascal in his private correspondence with Pierre de Fermat; however, the idea was not communicated to the broader scientific community. Three years later, in 1657, the Dutch mathematician Christiaan Huygens published a treatise (see Huygens (1657)), “De ratiociniis in ludo aleæ”, on probability theory, which not only laid down the foundations of the theory of probability but also treated the problem of points, presenting a solution essentially the same as Pascal’s.[4]

Neither Pascal nor Huygens used the term “expectation” in its modern sense. In particular, Huygens writes: “That my Chance or Expectation to win any thing is worth just such a Sum, as wou’d procure me in the same Chance and Expectation at a fair Lay. … If I expect a or b, and have an equal Chance of gaining them, my Expectation is worth \frac{a+b}{2}.” More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract “Théorie analytique des probabilités”, where the concept of expected value was defined explicitly:

… This advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for. We will call this advantage mathematical hope.

The use of the letter E to denote the expected value goes back to W. A. Whitworth (1901), “Choice and Chance”. The symbol has since become popular, as for English writers it suggested “Expectation”, for Germans “Erwartungswert”, and for the French “Espérance mathématique”.[5]

Examples

The expected outcome from one roll of an ordinary (that is, fair) six-sided die is


    \operatorname{E}(\text{roll with a 6-sided die}) = \Big(1 \times \frac16\Big) + \Big(2 \times \frac16\Big) + \Big(3 \times \frac16\Big) + \Big(4 \times \frac16\Big) + \Big(5 \times \frac16\Big) + \Big(6 \times \frac16\Big) = 3.5

which is not among the possible outcomes.[6]
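A short Python simulation (the sample size is an arbitrary choice) illustrates the long-run-average interpretation of this number:

    import random

    # The running average of fair-die rolls approaches the expected value
    # 3.5, even though 3.5 is never an individual outcome.
    random.seed(0)
    rolls = [random.randint(1, 6) for _ in range(100_000)]
    print(sum(rolls) / len(rolls))  # approximately 3.5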

A common application of expected value is gambling. For example, an American roulette wheel has 38 places where the ball may land, all equally likely. A winning bet on a single number pays 35-to-1, meaning that the original stake is not lost, and 35 times that amount is won, so you receive 36 times what you've bet. Considering all 38 possible outcomes, the expected value of the profit resulting from a dollar bet on a single number is the sum of potential net loss times the probability of losing and potential net gain times the probability of winning, that is,


     \operatorname{E}(\text{winnings from }$1\text{ bet}) = \left( -$1 \times \frac{37}{38} \right) + \left( $35 \times \frac{1}{38} \right) = -$0.052631579.

The net change in one's financial holdings is −$1 when one loses and +$35 when one wins. Thus one may expect, on average, to lose about five cents for every dollar bet, and the expected value of a one-dollar bet is $0.947368421. In gambling, a wager whose expected payoff equals the stake (i.e., the bettor's expected profit, or net gain, is zero) is called a “fair game”.
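For reference, the same expectation can be computed exactly with rational arithmetic; a minimal Python sketch:

    from fractions import Fraction

    # Exact expected profit of a $1 single-number bet on an American wheel:
    # lose $1 with probability 37/38, win $35 with probability 1/38.
    expected_profit = Fraction(-1) * Fraction(37, 38) + Fraction(35) * Fraction(1, 38)
    print(expected_profit, float(expected_profit))  # -1/19 ≈ -0.0526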

Mathematical definition

In general, if X\, is a random variable defined on a probability space (\Omega, \Sigma, P)\,, then the expected value of X\,, denoted by \operatorname{E}(X)\,, \langle X \rangle, \bar{X} or \mathbb{E}(X), is defined as

\operatorname{E}(X) = \int_\Omega X\, \operatorname{d}P

When this integral exists and converges absolutely, it is called the expectation of X. Absolute convergence is required because conditional convergence would mean that different orders of summation give different results, which is contrary to the nature of the expected value. Here the Lebesgue integral is employed. Note that not all random variables have an expected value, since the integral may not converge absolutely (e.g., the Cauchy distribution). Two variables with the same probability distribution have the same expected value, when it is defined.

If X is a discrete random variable with probability mass function p(x), then the expected value becomes

\operatorname{E}(X) = \sum_i x_i p(x_i) \,

as in the gambling example mentioned above.
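In code, this is just a probability-weighted sum; the following Python sketch uses a made-up mass function purely for illustration:

    # Expected value from a probability mass function, as a plain weighted sum.
    pmf = {0: 0.2, 1: 0.5, 2: 0.3}
    expectation = sum(x * p for x, p in pmf.items())
    print(expectation)  # 0*0.2 + 1*0.5 + 2*0.3 = 1.1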

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

\operatorname{E}(X) = \int_{-\infty}^\infty x f(x)\, \operatorname{d}x .
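As a sanity check, this integral can be evaluated numerically; the sketch below (assuming SciPy is available, and choosing the standard exponential density, whose mean is 1) is one way to do so in Python:

    from scipy.integrate import quad
    import math

    # Numerical check of E(X) = ∫ x f(x) dx for the exponential density
    # f(x) = e^{-x} on [0, ∞), whose mean is 1.
    value, _ = quad(lambda t: t * math.exp(-t), 0, math.inf)
    print(value)  # ≈ 1.0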

It follows directly from the discrete case definition that if X is a constant random variable, i.e. X = b for some fixed real number b, then the expected value of X is also b.

The expected value of an arbitrary function of X, g(X), with respect to the probability density function f(x) is given by the inner product of f and g:

\operatorname{E}(g(X)) = \int_{-\infty}^\infty g(x) f(x)\, \operatorname{d}x .

This is sometimes called the law of the unconscious statistician. Using representations as a Riemann–Stieltjes integral and integration by parts, the formula can be restated, for a random variable with \operatorname{P}(X \ge a) = 1, as

\operatorname{E}(g(X)) = g(a) + \int_a^\infty g'(x) \operatorname{P}(X > x) \, \operatorname{d}x.

As a special case, let \alpha denote a positive real number. Then

 \operatorname{E}(\left|X \right|^\alpha) = \alpha \int_{0}^{\infty} t^{\alpha -1}\operatorname{P}(\left|X \right|>t) \, \operatorname{d}t.

In particular, for  \alpha=1, this reduces to:

 \operatorname{E}(|X|) = \int_{0}^{\infty} \lbrace 1-F(t) \rbrace \, \operatorname{d}t,

provided \operatorname{P}(X \ge 0)=1, where F is the cumulative distribution function of X.
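A quick numerical sanity check of the \alpha-formula (a Python sketch assuming SciPy is available; the exponential choice, with \operatorname{P}(|X|>t)=e^{-t}, is arbitrary):

    from scipy.integrate import quad
    import math

    # Check E(|X|^α) = α ∫ t^(α-1) P(|X| > t) dt for exponential X;
    # for α = 2 the true value is E(X²) = 2.
    alpha = 2.0
    value, _ = quad(lambda t: alpha * t ** (alpha - 1) * math.exp(-t), 0, math.inf)
    print(value)  # ≈ 2.0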

Properties

Constants

The expected value of a constant is equal to the constant itself; i.e., if c is a constant, then  \operatorname{E}(c) = c.

Monotonicity

If X and Y are random variables so that X \le Y almost surely, then  \operatorname{E}(X) \le \operatorname{E}(Y).

Linearity

The expected value operator (or expectation operator) \operatorname{E} is linear in the sense that

\operatorname{E}(X + c)=  \operatorname{E}(X) + c\,
\operatorname{E}(X + Y)=  \operatorname{E}(X) + \operatorname{E}(Y)\,
\operatorname{E}(aX)= a \operatorname{E}(X)\,

Note that the second result is valid even if X is not statistically independent of Y. Combining the results from the previous three equations, we can see that

\operatorname{E}(aX + b)= a \operatorname{E}(X) + b\,
\operatorname{E}(a X + b Y) = a \operatorname{E}(X) + b \operatorname{E}(Y)\,

for any two random variables X and Y (which need to be defined on the same probability space) and any real numbers a and b.
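The linearity properties are easy to observe on samples; the Python sketch below deliberately uses a dependent pair (Y = X²), since independence is not required:

    import random

    # Monte-Carlo illustration of E(aX + bY) = a E(X) + b E(Y).
    random.seed(0)
    a, b, n = 2.0, 3.0, 100_000
    xs = [random.random() for _ in range(n)]
    ys = [x * x for x in xs]            # Y = X², so X and Y are not independent
    lhs = sum(a * x + b * y for x, y in zip(xs, ys)) / n
    rhs = a * sum(xs) / n + b * sum(ys) / n
    print(lhs, rhs)  # the two averages agree up to floating-point rounding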

Iterated expectation

Iterated expectation for discrete random variables

For any two discrete random variables X, Y one may define the conditional expectation:[7]

 \operatorname{E}(X|Y)(y) = \operatorname{E}(X|Y=y) = \sum\limits_x x \cdot \operatorname{P}(X=x|Y=y),

which means that \operatorname{E}(X|Y)(y) is a function of y.

Then the expectation of X satisfies


\begin{align}
\operatorname{E} \left( \operatorname{E}(X|Y) \right) &= \sum\limits_y \operatorname{E}(X|Y=y) \cdot \operatorname{P}(Y=y) \\
&=\sum\limits_y \left( \sum\limits_x x \cdot \operatorname{P}(X=x|Y=y) \right) \cdot \operatorname{P}(Y=y) \\
&=\sum\limits_y \sum\limits_x x \cdot \operatorname{P}(X=x|Y=y) \cdot \operatorname{P}(Y=y) \\
&=\sum\limits_y \sum\limits_x x \cdot \operatorname{P}(Y=y|X=x) \cdot \operatorname{P}(X=x) \\
&=\sum\limits_x x \cdot \operatorname{P}(X=x) \cdot \left( \sum\limits_y \operatorname{P}(Y=y|X=x) \right) \\
&=\sum\limits_x x \cdot \operatorname{P}(X=x) \\
&=\operatorname{E}(X).
\end{align}

Hence, the following equation holds:[8]

\operatorname{E}(X) = \operatorname{E} \left( \operatorname{E}(X|Y) \right).

The right hand side of this equation is referred to as the iterated expectation and is also sometimes called the tower rule. This proposition is treated in law of total expectation.
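The identity is easy to verify on a small discrete example; in the Python sketch below, the joint distribution is an arbitrary choice:

    # Check E(X) = E(E(X|Y)) on a small joint distribution,
    # where joint[(x, y)] = P(X = x, Y = y).
    joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p

    # E(X | Y = y) = sum_x x P(X = x | Y = y)
    cond_exp = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]
                for y in p_y}

    direct = sum(x * p for (x, y), p in joint.items())
    iterated = sum(cond_exp[y] * p_y[y] for y in p_y)
    print(direct, iterated)  # both 0.7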

Iterated expectation for continuous random variables

In the continuous case, the results are completely analogous. The definition of conditional expectation would use inequalities, density functions, and integrals to replace equalities, mass functions, and summations, respectively. However, the main result still holds:

\operatorname{E}(X) = \operatorname{E} \left( \operatorname{E}(X|Y) \right).

Inequality

If a random variable X is always less than or equal to another random variable Y, the expectation of X is less than or equal to that of Y:

If  X \leq Y, then  \operatorname{E}(X) \leq \operatorname{E}(Y).

In particular, since  X \leq |X| and  -X \leq |X| , the absolute value of the expectation of a random variable is less than or equal to the expectation of its absolute value:

|\operatorname{E}(X)| \leq \operatorname{E}(|X|)

Non-multiplicativity

In general, the expected value operator is not multiplicative, i.e. \operatorname{E}(X Y) is not necessarily equal to \operatorname{E}(X) \operatorname{E}(Y). If multiplicativity occurs, the X and Y variables are said to be uncorrelated (independent variables are a notable case of uncorrelated variables). The lack of multiplicativity gives rise to study of covariance and correlation.

If one considers the joint probability density function of X and Y, say j(x,y), then the expectation of XY is

\operatorname{E}(XY)=\iint xy \, j(x,y)\,\operatorname{d}x\,\operatorname{d}y.

Now if X and Y are independent, then by definition j(x,y)=f(x)g(y), where f and g are the marginal PDFs of X and Y. Then

\begin{align}
\operatorname{E}(XY) &= \iint xy \, f(x)g(y)\,\operatorname{d}y\,\operatorname{d}x \\
&= \int x f(x)\left[\int y g(y)\,\operatorname{d}y\right]\operatorname{d}x
= \int x f(x)\operatorname{E}(Y)\,\operatorname{d}x \\
&= \operatorname{E}(X)\operatorname{E}(Y).
\end{align}

Observe that independence of X and Y is required only to write j(x,y)=f(x)g(y), and this is what allows the double integral to factor in the second equality above.
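A sampling illustration of both cases in Python (sample size and distributions are arbitrary choices):

    import random

    # For independent X, Y the averages of XY and of X times Y nearly agree;
    # for the dependent pair (X, X) they generally do not.
    random.seed(0)
    n = 100_000
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]   # independent of xs
    mean = lambda v: sum(v) / len(v)
    print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))  # both ≈ 0
    print(mean([x * x for x in xs]), mean(xs) ** 2)                    # ≈ 1 vs ≈ 0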

Functional non-invariance

In general, the expectation operator and functions of random variables do not commute; that is

\operatorname{E}(g(X)) = \int_{\Omega} g(X)\, \operatorname{d}P \neq g(\operatorname{E}(X)).

A notable inequality concerning this topic is Jensen's inequality, involving expected values of convex (or concave) functions.
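For instance, with g(x) = x² and X uniform on [0, 1], one has \operatorname{E}(g(X)) = 1/3 while g(\operatorname{E}(X)) = 1/4; a short Python sketch:

    import random

    # g(E(X)) and E(g(X)) generally differ: here g(x) = x², X uniform on [0, 1].
    random.seed(0)
    xs = [random.random() for _ in range(100_000)]
    mean = sum(xs) / len(xs)
    print(mean ** 2, sum(x * x for x in xs) / len(xs))  # ≈ 0.25 vs ≈ 0.333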

Uses and applications

The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of X - \operatorname{E}(X). The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.
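The least-squares property can be checked numerically; in the Python sketch below (assuming NumPy is available), the data values and search grid are arbitrary illustrative choices:

    import numpy as np

    # The minimizing c found by a grid search over the sum of squared
    # residuals coincides with the sample mean.
    obs = np.array([2.0, 3.5, 3.5, 4.0, 7.0])
    grid = np.linspace(0, 10, 100_001)
    sse = ((obs[:, None] - grid[None, :]) ** 2).sum(axis=0)
    print(grid[sse.argmin()], obs.mean())  # both ≈ 4.0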

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a discrete random variable with values x_i and corresponding probabilities p_i. Now consider a weightless rod on which are placed weights, at locations x_i along the rod and having masses p_i (whose sum is one). The point at which the rod balances is \operatorname{E}(X).

Expected values can also be used to compute the variance, by means of the computational formula for the variance

\operatorname{Var}(X)=  \operatorname{E}(X^2) - (\operatorname{E}(X))^2.
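For the fair die above, this can be checked exactly in Python with rational arithmetic:

    from fractions import Fraction

    # Var(X) = E(X²) − (E(X))² for a fair six-sided die:
    # E(X) = 7/2, E(X²) = 91/6, so Var(X) = 91/6 − 49/4 = 35/12.
    p = Fraction(1, 6)
    ex = sum(k * p for k in range(1, 7))
    ex2 = sum(k * k * p for k in range(1, 7))
    print(ex2 - ex ** 2)  # 35/12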

A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator \hat{A} operating on a quantum state vector |\psi\rangle is written as \langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle. The uncertainty in \hat{A} can be calculated using the formula (\Delta A)^2 = \langle\hat{A}^2\rangle - \langle\hat{A}\rangle^2.

Expectation of matrices

If X is an m \times n matrix, then the expected value of the matrix is defined as the matrix of expected values:


\operatorname{E}(X)
=
\operatorname{E}
\begin{pmatrix}
 x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\
 x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\
 \vdots  & \vdots  & \ddots & \vdots  \\
 x_{m,1} & x_{m,2} & \cdots & x_{m,n}
\end{pmatrix}
=
\begin{pmatrix}
 \operatorname{E}(x_{1,1}) & \operatorname{E}(x_{1,2}) & \cdots & \operatorname{E}(x_{1,n}) \\
 \operatorname{E}(x_{2,1}) & \operatorname{E}(x_{2,2}) & \cdots & \operatorname{E}(x_{2,n}) \\
 \vdots                    & \vdots                    & \ddots & \vdots \\
 \operatorname{E}(x_{m,1}) & \operatorname{E}(x_{m,2}) & \cdots & \operatorname{E}(x_{m,n})
\end{pmatrix}.

This is utilized in covariance matrices.
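A brief sketch of estimating such a matrix of expected values entry-wise (assuming NumPy is available; the shape and distribution are arbitrary choices):

    import numpy as np

    # Entry-wise expectation of a random matrix, estimated by averaging
    # independent samples; each sample is a 2×3 matrix of uniform entries.
    rng = np.random.default_rng(0)
    samples = rng.uniform(0, 1, size=(10_000, 2, 3))
    print(samples.mean(axis=0))  # every entry ≈ 0.5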

Formulas for special cases

Discrete distribution taking only non-negative integer values

When a random variable takes only values in \{0,1,2,3,...\} we can use the following formula for computing its expectation:


\operatorname{E}(X)=\sum\limits_{i=1}^\infty P(X\geq i).

Proof:


\begin{align}
\sum\limits_{i=1}^\infty P(X\geq i)&=\sum\limits_{i=1}^\infty \sum\limits_{j=i}^\infty P(X = j)
\end{align}

interchanging the order of summation, we have


\begin{align}
\sum\limits_{i=1}^\infty P(X\geq i)&=\sum\limits_{j=1}^\infty \sum\limits_{i=1}^j P(X = j)\\
                   &=\sum\limits_{j=1}^\infty j\, P(X = j)\\
                   &=\operatorname{E}(X)
\end{align}

as claimed. This result can be a useful computational shortcut. For example, suppose we toss a coin where the probability of heads is p. How many tosses can we expect until the first heads (not including the heads itself)? Let X be this number. Note that we are counting only the tails and not the heads which ends the experiment; in particular, we can have X=0. Since the first i tosses yield tails exactly when X \geq i, we have \operatorname{P}(X \geq i) = (1-p)^i, so the expectation of X may be computed as  \sum_{i=1}^\infty (1-p)^i=\frac{1-p}{p} . This matches the expectation of a random variable with a geometric distribution (counting the failures before the first success). We used the formula for a geometric progression:
\sum_{k=1}^\infty r^k=\frac{r}{1-r}.
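The shortcut is easy to verify numerically for this coin example; in the Python sketch below, p = 1/3 is an arbitrary choice and the infinite sum is truncated:

    from fractions import Fraction

    # E(X) = Σ P(X ≥ i) for X = number of tails before the first head;
    # the tail sum equals (1−p)/p.
    p = Fraction(1, 3)
    tail_sum = sum((1 - p) ** i for i in range(1, 200))  # Σ_{i≥1} (1−p)^i, truncated
    print(float(tail_sum), float((1 - p) / p))           # both ≈ 2.0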

Continuous distribution taking non-negative values

Analogously with the discrete case above, when a continuous random variable X takes only non-negative values, we can use the following formula for computing its expectation:


\operatorname{E}(X)=\int_0^\infty P(X \ge x)\; dx

Proof: It is first assumed that X has a density f_X(t).


\begin{align}
\int_0^\infty P(X\ge x)\;dx &=\int_0^\infty \int_x^\infty f_X(t)\;dt\;dx
\end{align}

interchanging the order of integration, we have


\begin{align}
\int_0^\infty P(X\ge x)\;dx &= \int_0^\infty \int_0^t f_X(t)\;dx\;dt \\
                            &= \int_0^\infty t f_X(t)\;dt\\
                   &=\operatorname{E}(X)
\end{align}

as claimed. In case no density exists, interchanging the order of integration in the same way gives

\operatorname{E}(X) = \int_0^\infty \int_0^x \operatorname{d}t\, \operatorname{d}F(x) = \int_0^\infty \int_t^\infty \operatorname{d}F(x)\, \operatorname{d}t = \int_0^\infty \big(1-F(t)\big)\, \operatorname{d}t.
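A numerical check in Python (assuming SciPy is available), with an exponential X of rate 2, so that \operatorname{P}(X \ge x) = e^{-2x} and \operatorname{E}(X) = 1/2:

    from scipy.integrate import quad
    import math

    # E(X) = ∫ P(X ≥ x) dx for an exponential X with rate 2.
    value, _ = quad(lambda x: math.exp(-2 * x), 0, math.inf)
    print(value)  # ≈ 0.5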

Notes

  1. Sheldon M Ross (2007). "§2.4 Expectation of a random variable". Introduction to probability models (9th ed.). Academic Press. p. 38 ff. ISBN 0125980620. http://books.google.com/books?id=12Pk5zZFirEC&pg=PA38. 
  2. Richard W Hamming (1991). "§2.5 Random variables, mean and the expected value". The art of probability for scientists and engineers. Addison-Wesley. p. 64 ff. ISBN 0201406861. http://books.google.com/books?id=jX_F-77TA3gC&pg=PA64. 
  3. For a discussion of the Cauchy distribution, see Richard W Hamming (1991). "Example 8.7–1 The Cauchy distribution". The art of probability for scientists and engineers. Addison-Wesley. p. 290 ff. ISBN 0201406861. http://books.google.com/books?id=jX_F-77TA3gC&printsec=frontcover&dq=isbn:0201406861&cd=1#v=onepage&q=Cauchy&f=false. "Sampling from the Cauchy distribution and averaging gets you nowhere – one sample has the same distribution as the average of 1000 samples!" 
  4. In the foreword to his book, Huygens writes: “It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs.” (cited in Edwards (2002)). Thus, Huygens learned about de Méré’s problem in 1655 during his visit to France; later on in 1656 from his correspondence with Carcavi he learned that his method was essentially the same as Pascal’s; so that before his book went to press in 1657 he knew about Pascal’s priority in this subject.
  5. "Earliest uses of symbols in probability and statistics". http://jeff560.tripod.com/stat.html. 
  6. Sheldon M Ross. "Example 2.15". cited work. p. 39. ISBN 0125980620. http://books.google.com/books?id=12Pk5zZFirEC&pg=PA39. 
  7. Sheldon M Ross. "Chapter 3: Conditional probability and conditional expectation". cited work. p. 97 ff. ISBN 0125980620. http://books.google.com/books?id=12Pk5zZFirEC&pg=PA97. 
  8. Sheldon M Ross. "§3.4: Computing expectations by conditioning". cited work. p. 105 ff. ISBN 0125980620. http://books.google.com/books?id=12Pk5zZFirEC&pg=PA105. 

Historical background

  • Edwards, A.W.F (2002). Pascal’s arithmetical triangle: the story of a mathematical idea (2nd ed.). JHU Press. ISBN 0-8018-6946-3. 
  • Huygens, Christiaan (1657). De ratiociniis in ludo aleæ (English translation, published in 1714: [1]). 
